The family of concurrent logic programming languages
Ehud Shapiro 

September 1989 ACM Computing Surveys (CSUR), Volume 21 issue 3 
Publisher: ACM Press 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

Concurrent logic languages are high-level programming languages for parallel and distributed 
systems that offer a wide range of both known and novel concurrent programming techniques. 
Being logic programming languages, they preserve many advantages of the abstract logic 
programming model, including the logical reading of programs and computations, the convenience
of representing data structures with logical terms and manipulating them using unification, and 
amenability to metaprogrammin ... 



System-level power optimization: techniques and tools
Luca Benini, Giovanni de Micheli 

April 2000 ACM Transactions on Design Automation of Electronic Systems (TODAES), issue 2

Publisher: ACM Press 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

This tutorial surveys design methods for energy-efficient system-level design. We consider 
electronic systems consisting of a hardware platform and software layers. We consider the three
major constituents of hardware that consume energy, namely computation, communication, and
storage units, and we review methods of reducing their energy consumption. We also study 
models for analyzing the energy cost of software, and methods for energy-efficient software 
design and compilation. This survey ...

3 Parallel execution of prolog programs: a survey

Gopal Gupta, Enrico Pontelli, Khayri A.M. Ali, Mats Carlsson, Manuel V. Hermenegildo

July 2001 ACM Transactions on Programming Languages and Systems (TOPLAS), volume 23 issue 4

Publisher: ACM Press 

Full text available: pdf(1.95 MB) Additional Information: full citation, abstract, references, citings, index terms

Since the early days of logic programming, researchers in the field realized the potential for 
exploitation of parallelism present in the execution of logic programs. Their high-level nature, the
presence of nondeterminism, and their referential transparency, among other characteristics,
make logic programs interesting candidates for obtaining speedups through parallel execution. At
the same time, the fact that the typical applications of logic programming frequently involve 
irregular computatio ... 



Keywords: Automatic parallelization, constraint programming, logic programming, parallelism, 
prolog 



Pipeline Architecture 

C. V. Ramamoorthy, H. F. Li 

January 1977 ACM Computing Surveys (CSUR), Volume 9 issue 1 
Publisher: ACM Press 

Full text available: pdf(3.53 MB) Additional Information: full citation, references, citings, index terms
On randomization in sequential and distributed algorithms
Rajiv Gupta, Scott A. Smolka, Shaji Bhaskar 
March 1994 ACM Computing Surveys (CSUR), volume 26 issue 1
Publisher: ACM Press 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

Probabilistic, or randomized, algorithms are fast becoming as commonplace as conventional 
deterministic algorithms. This survey presents five techniques that have been widely used in the
design of randomized algorithms. These techniques are illustrated using 12 randomized 
algorithms— both sequential and distributed— that span a wide range of applications, 
including primality testing (a classical problem in number theory), interactive probabilistic proof
s ... 

Keywords: Byzantine agreement, CSP, analysis of algorithms, computational complexity, dining
philosophers problem, distributed algorithms, graph isomorphism, hashing, interactive probabilistic
proof systems, leader election, message routing, nearest-neighbors problem, perfect hashing, 
primality testing, probabilistic techniques, randomized or probabilistic algorithms, randomized 
quicksort, sequential algorithms, transitive tournaments, universal hashing 



Compiler-based I/O prefetching for out-of-core applications
Angela Demke Brown, Todd C. Mowry, Orran Krieger 

May 2001 ACM Transactions on Computer Systems (TOCS), volume 19 issue 2 
Publisher: ACM Press 

Full text available: pdf(499.03 KB) Additional Information: full citation, abstract, references, citings, index terms, review

Current operating systems offer poor performance when a numeric application's working set does
not fit in main memory. As a result, programmers who wish to solve "out-of-core" problems 
efficiently are typically faced with the onerous task of rewriting an application to use explicit I/O
operations (e.g., read/write). In this paper, we propose and evaluate a fully automatic technique
which liberates the programmer from this task, provides high performance, and requires only 
minima ... 

Keywords: compiler optimization, prefetching, virtual memory 



Distributed systems - programming and management: On remote procedure call 
Patricia Gomes Soares 

November 1992 Proceedings of the 1992 conference of the Centre for Advanced Studies on 

Collaborative research - Volume 2 
Publisher: IBM Press 

Full text available: pdf Additional Information: full citation, abstract, references, citings

The Remote Procedure Call (RPC) paradigm is reviewed. The concept is described, along with the
backbone structure of the mechanisms that support it. An overview of works in supporting these
mechanisms is discussed. Extensions to the paradigm that have been proposed to enlarge its
suitability are studied. The main contributions of this paper are a standard view and classification
of RPC mechanisms according to different perspectives, and a snapshot of the paradigm in use 
today and of goals for t ... 

Abstract state machines capture parallel algorithms 
Andreas Blass, Yuri Gurevich 

October 2003 ACM Transactions on Computational Logic (TOCL), volume 4 issue 4 
Publisher: ACM Press 



Full text available: 



Additional Information: full citation, abstract, references, index terms 



We give an axiomatic description of parallel, synchronous algorithms. Our main result is that every
such algorithm can be simulated, step for step, by an abstract state machine with a background
that provides for multisets. 

Keywords: ASM thesis, Parallel algorithm, abstract state machine, postulates for parallel 
computation 



SODA: a simplified operating system for distributed applications 
Jonathan Kepecs, Marvin Solomon 

October 1985 ACM SIGOPS Operating Systems Review, volume 19 issue 4 
Publisher: ACM Press 

Full text available: pdf(1.05 MB) Additional Information: full citation, references, citings



10 SODA: A simplified operating system for distributed applications

Jonathan Kepecs, Marvin Solomon 
August 1984 Proceedings of the third annual ACM symposium on Principles of distributed
computing

Publisher: ACM Press 

Full text available: pdf(925.16 KB) Additional Information: full citation, abstract, references, citings, index terms

The design and implementation study discussed in this paper can be viewed in two ways. On one
hand, it represents a contribution to the active area of design of "smart" communications
controllers which use increasingly sophisticated processor/memory configurations to improve the
performance of interprocessor communication. On the other hand, it represents an application of
the minimalist principles of the RISC (Reduced Instruction-Set Computer) architecture [1] to 
operating syst ... 



11 Parallel RAMs with owned global memory and deterministic context-free language recognition
Patrick W. Dymond, Walter L. Ruzzo

January 2000 Journal of the ACM (JACM), volume 47 issue 1 

Publisher: ACM Press 

Full text available: pdf(223.64 KB) Additional Information: full citation, abstract, references, index terms, review

We identify and study a natural and frequently occurring subclass of Concurrent Read, Exclusive 
Write Parallel Random Access Machines (CREW-PRAMs). Called Concurrent Read, Owner Write, or
CROW-PRAMs, these are machines in which each global memory location is assigned a unique
"owner" processor, which is the only processor allowed to write into it. Considering the difficulties
that would be involved in physically realizing a full CREW-PRAM model and demonstrate i ...

Keywords: CROW-PRAM, DCFL recognition, owner write, parallel algorithms 



A Survey of Some Theoretical Aspects of Multiprocessing
J. L. Baer

January 1973 ACM Computing Surveys (CSUR), volume 5 issue 1 
Publisher: ACM Press 

Full text available: pdf(4.05 MB) Additional Information: full citation, references, citings, index terms



13 Techniques for reducing consistency-related communication in distributed shared-memory systems
John B. Carter, John K. Bennett, Willy Zwaenepoel

August 1995 ACM Transactions on Computer Systems (TOCS), Volume 13 issue 3 
Publisher: ACM Press 

Full text available: pdf(2.86 MB) Additional Information: full citation, abstract, references, citings, index terms, review



Distributed shared memory (DSM) is an abstraction of shared memory on a distributed-memory
machine. Hardware DSM systems support this abstraction at the architecture level; software DSM
systems support the abstraction within the runtime system. One of the key problems in building an
efficient software DSM system is to reduce the amount of communication needed to keep the
distributed memories consistent. In this article we present four techniques for doing so: software
release consistency; m ... 

Keywords: cache consistency protocols, distributed shared memory, memory models, release 
consistency, virtual shared memory 



14 Experience with transactions in Quicksilver
Frank Schmuck, Jim Wylie 

September 1991 ACM SIGOPS Operating Systems Review, Proceedings of the thirteenth ACM
symposium on Operating systems principles, volume 25 issue 5
Publisher: ACM Press

Full text available: pdf(1.66 MB) Additional Information: full citation, abstract, references, citings, index terms

All programs in the Quicksilver distributed system behave atomically with respect to their updates
to permanent data. Operating system support for transactions provides the framework required to
support this, as well as a mechanism that unifies reclamation of resources after failures or normal
process termination. This paper evaluates the use of transactions for these purposes in a general-purpose
operating system and presents some of the lessons learned from our experience with a
complet ... 

15 Adaptive History-Based Memory Schedulers
Ibrahim Hur, Calvin Lin 

December 2004 Proceedings of the 37th annual International Symposium on 

Microarchitecture 
Publisher: IEEE Computer Society 

Full text available: pdf(220.26 KB) Additional Information: full citation, abstract

As memory performance becomes increasingly important to overall system performance, the need
to carefully schedule memory operations also increases. This paper presents a new approach to
memory scheduling that considers the history of recently scheduled operations. This history-based
approach provides two conceptual advantages: (1) it allows the scheduler to better reason about
the delays associated with its scheduling decisions, and (2) it allows the scheduler to select 
operations so that they ... 

16 Data-Driven and Demand-Driven Computer Architecture
Philip C. Treleaven, David R. Brownbridge, Richard P. Hopkins
January 1982 ACM Computing Surveys (CSUR), volume 14 issue 1

Publisher: ACM Press 

Full text available: pdf(4.14 MB) Additional Information: full citation, references, citings, index terms
May 1990 ACM Transactions on Computer Systems (TOCS), volume 8 issue 2
Publisher: ACM Press 

Full text available: pdf(3.83 MB) Additional Information: full citation, abstract, references, citings, index terms



Application programs written for large-scale multicomputers with interconnection structures known
to the programmer (e.g., hypercubes or meshes) use complex communication structures for
connecting the applications' parallel tasks. Such structures implement a wide variety of functions,
including the exchange of data or control information relevant to the task computations and/or 
communications required for task synchronization, message forwarding/filtering under program 
control, and so o ... 
Thomas A. Henzinger, Christoph M. Kirsch

May 2002 ACM SIGPLAN Notices, Proceedings of the ACM SIGPLAN 2002 Conference on
Programming language design and implementation, volume 37 issue 5
Publisher: ACM Press



Full text available: pdf(223.65 KB) Additional Information: full citation, abstract, references, citings, index terms

The Embedded Machine is a virtual machine that mediates in real time the interaction between 
software processes and physical processes. It separates the compilation of embedded programs 
into two phases. The first, platform-independent compiler phase generates E code (code executed
by the Embedded Machine), which supervises the timing —not the scheduling— of application 
tasks relative to external events, such as clock ticks and sensor interrupts. E code is portable and
exhibits, given an inpu ... 

Keywords: real time, virtual machine 



19 Comparative evaluation of latency reducing and tolerating techniques 

Anoop Gupta, John Hennessy, Kourosh Gharachorloo, Todd Mowry, Wolf-Dietrich Weber 

April 1991 ACM SIGARCH Computer Architecture News, Proceedings of the 18th annual
international symposium on Computer architecture, volume 19 issue 3
Publisher: ACM Press

Full text available: pdf(1.36 MB) Additional Information: full citation, references, citings, index terms



20 Cache Refill/Access Decoupling for Vector Machines

Christopher Batten, Ronny Krashinsky, Steve Gerding, Krste Asanovic 

December 2004 Proceedings of the 37th annual International Symposium on 

Microarchitecture 
Publisher: IEEE Computer Society 

Full text available: pdf(319.32 KB) Additional Information: full citation, abstract

Vector processors often use a cache to exploit temporal locality and reduce memory bandwidth 
demands, but then require expensive logic to track large numbers of outstanding cache misses to
sustain peak bandwidth from memory. We present refill/access decoupling, which augments the
vector processor with a Vector Refill Unit (VRU) to quickly pre-execute vector memory commands
and issue any needed cache line refills ahead of regular execution. The VRU reduces costs by 
eliminating much of the outstan ... 